UDF based on existing function infra #4804

Merged
merged 8 commits into from
Feb 13, 2023

Conversation

zhaojunnana
Contributor

@zhaojunnana zhaojunnana commented Oct 28, 2022

What type of PR is this?

  • bug
  • feature
  • enhancement

What problem(s) does this PR solve?

Issue(s) number:

#4793
Close #5337

Description:

Users can define their own functions (UDFs) by implementing the GraphFunction interface.

How do you solve it?

Special notes for your reviewer, ex. impact of this fix, design document, etc:

Checklist:

Tests:

  • Unit test(positive and negative cases)
  • Function test
  • Performance test
  • N/A

Affects:

  • Documentation affected (Please add the label if documentation needs to be modified.)
    see UDF based on existing function infra #4804
  • Incompatibility (If it breaks the compatibility, please describe it and add the label.)
  • If it's needed to cherry-pick (If cherry-pick to some branches is required, please label the destination version(s).)
  • Performance impacted: Consumes more CPU/Memory

Release notes:

Please confirm whether to be reflected in release notes and how to describe:

ex. Fixed the bug .....

@CLAassistant

CLAassistant commented Oct 28, 2022

CLA assistant check
All committers have signed the CLA.

@xtcyclist xtcyclist added the ready-for-testing PR: ready for the CI test label Nov 1, 2022
@wey-gu wey-gu requested a review from a team as a code owner February 13, 2023 04:11
Co-authored-by: Wey Gu <[email protected]>
Co-authored-by: Cheng Xuntao <[email protected]>
@wey-gu
Contributor

wey-gu commented Feb 13, 2023

@zhaojunnana I added an example of building the UDF, too, based on some downstream work from @xtcyclist.

Nice job, we tested it and it worked like a charm :)

@xtcyclist g++ is used so that Red Hat distributions can build it, too.

@wey-gu
Contributor

wey-gu commented Feb 13, 2023

Docs:


Prepare the build environment (see the linked reference).

Go to the code repository and write the UDF:

$ tree udf
udf
|-- Makefile
|-- standard_deviation.cpp
`-- standard_deviation.h

Here we implement a function that computes the standard deviation of a given list of numbers.

Code in standard_deviation.h:

#ifndef UDF_PROJECT_STANDARD_DEVIATION_H
#define UDF_PROJECT_STANDARD_DEVIATION_H

#include <functional>
#include <vector>

#include "../src/common/function/GraphFunction.h"

class standard_deviation : public GraphFunction {
 public:
  char *name() override;

  std::vector<std::vector<nebula::Value::Type>> inputType() override;

  nebula::Value::Type returnType() override;

  size_t minArity() override;

  size_t maxArity() override;

  bool isPure() override;

  nebula::Value body(const std::vector<std::reference_wrapper<const nebula::Value>> &args) override;
};

#endif  // UDF_PROJECT_STANDARD_DEVIATION_H

Code in standard_deviation.cpp:

/* Copyright (c) 2020 vesoft inc. All rights reserved.
 *
 * This source code is licensed under Apache 2.0 License.
 */

#include <cmath>
#include <vector>

#include "standard_deviation.h"

#include "../src/common/datatypes/List.h"
#include "../src/common/datatypes/Value.h"

extern "C" GraphFunction *create() {
  return new standard_deviation;
}
extern "C" void destroy(GraphFunction *function) {
  delete function;
}

char *standard_deviation::name() {
  const char *name = "standard_deviation";
  return const_cast<char *>(name);
}

std::vector<std::vector<nebula::Value::Type>> standard_deviation::inputType() {
  std::vector<nebula::Value::Type> vtp = {nebula::Value::Type::LIST};
  std::vector<std::vector<nebula::Value::Type>> vvtp = {vtp};
  return vvtp;
}

nebula::Value::Type standard_deviation::returnType() {
  return nebula::Value::Type::FLOAT;
}

size_t standard_deviation::minArity() {
  return 1;
}

size_t standard_deviation::maxArity() {
  return 1;
}

bool standard_deviation::isPure() {
  return true;
}

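// Population standard deviation: the square root of the mean squared
// deviation from the mean. The input must not be empty, otherwise the
// divisions below divide by zero.
double standardDeviation(const std::vector<double> &numbers) {
    double sum = 0;
    for (double number : numbers) {
        sum += number;
    }
    double average = sum / numbers.size();

    double variance = 0;
    for (double number : numbers) {
        double difference = number - average;
        variance += difference * difference;
    }
    variance /= numbers.size();

    return std::sqrt(variance);
}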

nebula::Value standard_deviation::body(
    const std::vector<std::reference_wrapper<const nebula::Value>> &args) {
  switch (args[0].get().type()) {
    case nebula::Value::Type::NULLVALUE: {
      return nebula::Value::kNullValue;
    }
    case nebula::Value::Type::LIST: {
      std::vector<double> numbers;
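      // Bind by const reference to avoid copying the whole list.
      const auto &list = args[0].get().getList();
      const size_t size = list.size();

      // Use size_t for the index to avoid a signed/unsigned comparison.
      for (size_t i = 0; i < size; i++) {
        const auto &value = list[i];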
        if (value.isInt()) {
          numbers.push_back(value.getInt());
        } else if (value.isFloat()) {
          numbers.push_back(value.getFloat());
        } else {
          return nebula::Value::kNullValue;
        }
      }
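      if (numbers.empty()) {
        // The standard deviation of an empty list is undefined.
        return nebula::Value::kNullValue;
      }
      return nebula::Value(standardDeviation(numbers));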
    }
    default: {
      return nebula::Value::kNullValue;
    }
  }
}

Then compile:

$ cd udf; make
clang++-10 ./standard_deviation.cpp -c -o standard_deviation.o -I ../src/ -fPIC -I ../build/third-party/install/include/
clang++-10 -shared -o standard_deviation.so standard_deviation.o

The source is now compiled into a binary .so file, as shown below:

$ tree .
.
|-- Makefile
|-- standard_deviation.cpp
|-- standard_deviation.h
|-- standard_deviation.o
`-- standard_deviation.so

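Load the UDF into GraphD

Suppose the code repository is located at /home/foobar/dev/nebula/ and the UDF code is in its udf directory. Edit the configuration of the graphd instance that should load the UDF: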

$ tail /usr/local/nebula/etc/nebula-graphd.conf -n 5

# enable udf, c++ only
--enable_udf=true
# set the directory where the .so of udf are stored
--udf_path=/home/foobar/dev/nebula/udf/

Restart graphd

sudo /usr/local/nebula/scripts/nebula.service restart graphd
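
For intuition about why standard_deviation.cpp exports extern "C" create and destroy functions: a host process can open a shared library at runtime and look those symbols up by their unmangled names. The sketch below shows that general POSIX dlopen/dlsym pattern. It is an assumption about the mechanism, not the actual FunctionUdfManager code from this PR; UdfHandle, loadUdf, and the void * stand-in for GraphFunction * are hypothetical names used only for illustration.

```cpp
#include <dlfcn.h>

#include <string>

// Hypothetical stand-ins: the real factory functions use GraphFunction *.
using CreateFn = void *(*)();
using DestroyFn = void (*)(void *);

struct UdfHandle {
  void *library = nullptr;      // handle returned by dlopen
  CreateFn create = nullptr;    // resolved "create" symbol
  DestroyFn destroy = nullptr;  // resolved "destroy" symbol
  bool ok() const { return library && create && destroy; }
};

// Opens the shared object at `path` and resolves the two extern "C" symbols.
// Returns a handle with ok() == false if the file or a symbol is missing.
UdfHandle loadUdf(const std::string &path) {
  UdfHandle handle;
  handle.library = dlopen(path.c_str(), RTLD_NOW);
  if (handle.library == nullptr) {
    return handle;  // dlerror() would describe the failure
  }
  handle.create = reinterpret_cast<CreateFn>(dlsym(handle.library, "create"));
  handle.destroy = reinterpret_cast<DestroyFn>(dlsym(handle.library, "destroy"));
  if (!handle.ok()) {
    // A required symbol is missing: release the library and report failure.
    dlclose(handle.library);
    handle = UdfHandle{};
  }
  return handle;
}
```

A caller would check ok(), call create() to obtain the function object, and hand that object back to destroy() before closing the library. On older glibc versions this needs to be linked with -ldl.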

Test the UDF

Connect to the restarted GraphD and test the standard_deviation function:

$ nebula-console-3.0 -addr 127.0.0.1  -port 9669 -user root -p nebula

(root@nebula) [(none)]> yield standard_deviation([1,2,3])
+-----------------------------+
| standard_deviation([1,2,3]) |
+-----------------------------+
| 0.816496580927726           |
+-----------------------------+
Got 1 rows (time spent 9944/18471 us)

(root@nebula) [(none)]> yield standard_deviation([1,1,1])
+-----------------------------+
| standard_deviation([1,1,1]) |
+-----------------------------+
| 0.0                         |
+-----------------------------+
Got 1 rows (time spent 4559/12630 us)

Fri, 03 Feb 2023 16:11:24 CST

(root@nebula) [basketballplayer]> GO 1 TO 2 STEPS FROM "player100" OVER follow YIELD properties(edge).degree AS d | yield collect($-.d)
+--------------------------+
| collect($-.d)            |
+--------------------------+
| [95, 95, 95, 90, 95, 90] |
+--------------------------+
Got 1 rows (time spent 6424/14165 us)

Fri, 03 Feb 2023 16:12:05 CST

(root@nebula) [basketballplayer]> GO 1 TO 2 STEPS FROM "player100" OVER follow YIELD properties(edge).degree AS d | yield collect($-.d) AS d | yield standard_deviation($-.d)
+--------------------------+
| standard_deviation($-.d) |
+--------------------------+
| 2.357022603955158        |
+--------------------------+
Got 1 rows (time spent 9809/15682 us)
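
The results above can be cross-checked outside NebulaGraph. The following is a minimal standalone sketch, not part of the PR: it repeats the population standard deviation formula used by standardDeviation in standard_deviation.cpp. The helper names populationStdDev and closeTo, and the tolerance value, are assumptions made for this illustration.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical standalone helper (not from the PR): population standard
// deviation, sqrt(sum((x - mean)^2) / N), mirroring the UDF's formula.
double populationStdDev(const std::vector<double> &numbers) {
  assert(!numbers.empty());  // the mean is undefined for an empty input
  double sum = 0.0;
  for (double x : numbers) {
    sum += x;
  }
  const double mean = sum / static_cast<double>(numbers.size());

  double squared = 0.0;
  for (double x : numbers) {
    squared += (x - mean) * (x - mean);
  }
  return std::sqrt(squared / static_cast<double>(numbers.size()));
}

// True when two doubles agree within an assumed absolute tolerance.
bool closeTo(double a, double b, double tol = 1e-9) {
  return std::fabs(a - b) < tol;
}
```

With these helpers, closeTo(populationStdDev({1, 2, 3}), 0.816496580927726) and closeTo(populationStdDev({95, 95, 95, 90, 95, 90}), 2.357022603955158) should both hold, matching the console output up to floating-point rounding.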

@wey-gu wey-gu changed the title commit udf UDF based on existing function infra Feb 13, 2023
@codecov-commenter

Codecov Report

Base: 77.69% // Head: 78.61% // Increases project coverage by +0.91% 🎉

Coverage data is based on head (ba51c00) compared to base (a6d31b3).
Patch coverage: 25.21% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #4804      +/-   ##
==========================================
+ Coverage   77.69%   78.61%   +0.91%     
==========================================
  Files        1119     1120       +1     
  Lines       83665    83797     +132     
==========================================
+ Hits        65005    65878     +873     
+ Misses      18660    17919     -741     
Impacted Files Coverage Δ
src/common/function/FunctionManager.h 100.00% <ø> (ø)
src/common/function/FunctionUdfManager.cpp 22.12% <22.12%> (ø)
src/common/function/FunctionManager.cpp 78.84% <83.33%> (+0.01%) ⬆️
src/common/memory/MemoryTracker.cpp 60.00% <0.00%> (-16.00%) ⬇️
src/common/memory/Memory.h 72.34% <0.00%> (-10.27%) ⬇️
src/common/memory/NewDelete.cpp 56.25% <0.00%> (-6.25%) ⬇️
src/graph/executor/StorageAccessExecutor.h 50.00% <0.00%> (-4.35%) ⬇️
src/common/thread/GenericWorker.h 80.76% <0.00%> (-3.85%) ⬇️
src/storage/transaction/TransactionManager.cpp 39.45% <0.00%> (-2.71%) ⬇️
src/storage/mutate/AddVerticesProcessor.cpp 86.07% <0.00%> (-2.11%) ⬇️
... and 68 more


Contributor

@xtcyclist xtcyclist left a comment

This contribution on udf is very much appreciated.

Contributor

@jievince jievince left a comment

Good job

@xtcyclist xtcyclist merged commit 26bec49 into vesoft-inc:master Feb 13, 2023
@wey-gu
Contributor

wey-gu commented Feb 13, 2023

Thanks a lot @zhaojunnana !

@abby-cyber abby-cyber added doc affected PR: improvements or additions to documentation and removed doc affected PR: improvements or additions to documentation labels Mar 30, 2023
@wey-gu
Contributor

wey-gu commented Mar 30, 2023

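@abby-cyber I noticed that I had used the wrong name: the example I gave computes the standard deviation, not the variance. I have edited it. Also, as we discussed on the phone, for the make part please refer to the latest PR.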

Labels
ready-for-testing PR: ready for the CI test
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support user-defined functions
8 participants